A simple and fast method to determine the parameters for fuzzy c-means cluster validation

Abstract

Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional data sets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to determine the two parameters fuzzifier and cluster number. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random data set but which detects cluster formation with maximum resolution on the edge of randomness. Estimation of the optimal parameter values is achieved by evaluation of the results of the clustering procedure applied to randomized data sets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the data set. Taking the dimension of the set and the number of objects as input values instead of evaluating the entire data set allows us to propose a functional relationship determining its value directly. This result speaks strongly against setting the fuzzifier equal to 2 as typically done in many previous studies. Validation indices are generally used for the estimation of the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are at least equivalent or better than those obtained by other computationally more expensive indices.
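
To make the two parameters concrete, the following is a minimal sketch of textbook fuzzy c-means together with the minimum centroid distance mentioned in the abstract. It is not the authors' implementation. The function names, the `init_centroids` argument, the convergence tolerance, and the use of Euclidean distance are all assumptions made for illustration. The abstract does not state the functional relationship for the fuzzifier, so the fuzzifier is simply passed in as an argument.

```python
import math
import random


def _memberships(points, centroids, fuzzifier):
    """Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))."""
    exponent = 2.0 / (fuzzifier - 1.0)
    rows = []
    for p in points:
        # Floor the distances to avoid division by zero when a point
        # coincides with a centroid.
        dists = [max(math.dist(p, c), 1e-12) for c in centroids]
        rows.append(
            [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        )
    return rows


def fuzzy_c_means(points, n_clusters, fuzzifier, init_centroids=None,
                  n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centroids, memberships).

    `init_centroids`, `n_iter`, `tol` and `seed` are illustrative choices,
    not values taken from the paper.
    """
    if fuzzifier <= 1.0:
        raise ValueError("fuzzifier must be greater than 1")
    dim = len(points[0])
    if init_centroids is None:
        # Default: start from randomly chosen data points.
        init_centroids = random.Random(seed).sample(points, n_clusters)
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        u = _memberships(points, centroids, fuzzifier)
        new_centroids = []
        for k in range(n_clusters):
            # Centroid update: mean of the points weighted by u_ik^m.
            weights = [row[k] ** fuzzifier for row in u]
            total = sum(weights)
            new_centroids.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, _memberships(points, centroids, fuzzifier)


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between centroids."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )
```

One way to use this sketch is to run `fuzzy_c_means` for a range of cluster numbers and compare `min_centroid_distance` across runs. A value close to zero suggests that two centroids have collapsed onto each other, which happens when the requested cluster number or the fuzzifier is too large for the structure in the data.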

Bibliographic details

  • Year: 2010
  • Original format: PDF
  • Language: English
